ABSTRACT The management of large events with hundreds of thousands of individuals 
has remained a challenge over the years. Crushes and stampedes at mass gatherings 
have claimed many lives around the world. Given the substantial advancement in 
positional tracking, wearable technology, and wireless communication, many event 
organizers are adopting these technologies to assist in managing large events. 
Intelligent monitoring of crowd movement and timely analysis of evolving conditions 
may aid in the early detection of critical situations. The current research proposes 
a big data resource framework to model, simulate, and visualize crowd conditions for 
actual venue settings. A distributed framework is presented to monitor the movement 
and interaction of individuals in large crowded events through localized sensing and 
geospatial analysis of massive positional data. The pilgrimage (Hajj), an annual event 
involving more than a million people, is considered as a case study for demonstrating 
the effectiveness of the proposed framework. The framework has been evaluated with the 
help of synthetic data covering some useful and frequent scenarios from this case study. 


INDEX TERMS agent-based modeling, big data, crowd simulation, crowd analytics, 
crowd visualization, multi-agent system 


I. INTRODUCTION 


Crowd management in large events with 
hundreds of thousands or millions of 
individuals is a critical challenge for 
authorities. For instance, the physical 
management and monitoring of individuals 
in large events, such as the Olympics, 
Hajj, political gatherings, and live concerts, 
is very tedious. Substandard route and 
traffic management in such events 
may result in congestion which, in turn, 
creates panic among individuals and 
causes stampedes. Such 
stampedes have been a major cause of 
accidents resulting in injuries and deaths. 
Furthermore, in large-scale 
events that span multiple days, for 
instance Hajj and the Olympics, there is a 
strong chance that people may get 
disconnected from their main group. 
This is a particularly common issue in 
Hajj, where most of the pilgrims are old 
and are not accustomed to using cell phones. 
Technology has been used in many 
different forms to manage these 
large crowds. For instance, computer 
vision-based techniques have been used to 
monitor the mobility and behaviors of 
crowds. Despite substantial 
advancement in this field, these methods 
have turned out to be ineffective when 
applied at a large scale due to limitations 
of viewing angle and ambiguous 
detection. 


On the other hand, the evolving trend of 
smart or context-aware environments has 
led to the proposition of geographically 
informative systems. In these systems, 
advanced tracking technologies are used to 
collect quantitative movement data. 
Individuals within a crowd are observed as 
profiled users. Global Positioning System 
(GPS) localization [1] and proximity-based 
tracking [2] have been used to capture 
complex crowd dynamics during an event. 
However, implementing and evaluating 
such systems still poses many 
challenges, such as gathering and 
processing data from heterogeneous 
sources and the expense of equipping 
individuals or places with sensors. 


Simulation serves as a means to design 
problematic scenarios and a mechanism 
to evaluate the behavior and 
operation of the chosen solution or system. 
Crowd simulation, in particular, attempts 
to give an appropriate abstraction of the 
domain mentioned above. For the 
realistic simulation of such events, the 
Multi-Agent System (MAS) approach is 
considered suitable, as the agents are 
expected to move toward their goals, interact 
with their environment, and respond to 
each other. MAS is also regarded as a 
preferred approach for emergency 
evacuation [3] because it models 
problems in terms of autonomous 
interacting component agents, which 
is a more natural way of simulating the 
unpredictability of large crowds. 


It is pertinent to mention that crowd 
simulation methods for controlling 
individual agents in high-density 
crowds [4], interactive virtual environments 
[5], and position-based solutions [6] have 
already been proposed. The current study, 
however, distinguishes itself through 
a systematic analysis 
of the movement of crowds and its 
visualization in large events. The 
continuous tracking of individuals at a 
massive scale can be done more efficiently 
through a localized distributed sensing 
infrastructure along with the existing 
support of GPS. 


The current research presents a big data 
resource framework that, along with the 
simulation of a physical sensing network, 
includes a parallel computing 
infrastructure to process large sets of 
collected positional data for real-time 
analytics, providing a crowd-monitoring 
platform for the authorities. The proposed 
framework has 
been evaluated by simulating some useful 
scenarios from a case study of Hajj. 


The rest of the study is organized as 
follows. Section II discusses the 
relevant literature. Section III presents 
the preliminary definitions on which the 
proposed framework is built. Section IV 
provides complete details of the proposed 
framework and its relevance to one of the 
most frequently organized large-scale events, 
Hajj. Experiments involving spatio-temporal 
analysis of the positional data generated by 
the simulation, for some useful and 
frequently occurring scenarios in Hajj, 
are presented subsequently. 
The last section concludes the 
research. 


II. RELATED WORK 


The management and monitoring of large crowded 
events has been discussed by many 
researchers. Some studies have 
proposed multi-agent systems for crowd 
simulation and management, while others 
have worked on managing crowds based on 
the capacity of a venue with the help of 
video data processing or 
multi-agent systems. This section 
discusses studies that have 
approached this problem in different 
ways. 


A framework for the simulation of 
cooperative tasks performed by agents was 
proposed by [7]. The proposed solution 
supported four basic actions based on 
real-life scenarios. The framework also 
managed workers’ tasks, collision 
avoidance, and the state and 
movement of workers. An agent-based 
model was presented in [8] to identify the 
factors that create panic among people, 
together with a way of evacuating large 
crowds during an unfortunate event. 


A data-driven learning technique was 
proposed for representing the information 
pertaining to different scenarios of crowd 
simulation. A method based on a low-dimensional 
crowd space was 
used to analyze crowd simulation accuracy [9]. 


Some other researchers developed 
agent-based simulation techniques by 
extracting the features of the crowd from 
video. For instance, [10] used global path 
planning and collision avoidance strategies 
for evacuating the crowd. A multi-agent 
model was presented for evacuating the 
crowd during a violent attack [11], while 
[12] proposed a framework based on multi-agent 
reinforcement learning to detect 
congestion in the crowd. Another agent-based 
crowd simulation framework was 
proposed by [13]; it employed 
AnyLogic libraries to simulate 
emergency evacuation strategies for a crowd 
in a runtime environment. The 
proposed framework is based on an analytical 
simulation environment. A Cluster 
Verification Model (CVM) using a 
Wireless Sensor Network (WSN) was proposed to 
address the aforementioned problem for 
pandemic cases through a single cluster 
approach (SCA) and a split cluster approach 
(SpCA) [14]. 


An interesting work on managing a massive 
crowd according to the capacity of a place, 
by using a WSN to count the number of persons 
entering and leaving a specific area, 
was presented by [15]. Similarly, 
[16] proposed a WSN architecture for 
efficiently supplying 
food items to a massive crowd. The 
supply of food was conducted automatically in 
clusters within a given time limit 
to avoid any food shortage. A real-time 
crowd simulation solution based on 
the integration of the potential 
field method and the agent-based method 
was proposed in [17]. 
Different fields are used to calculate 
real-time interactions and prevent collisions 
among agents. Another relevant and 
interesting study discussed the 
self-evacuation of passengers during panic. 
A ripple effect rule is used to model the 
spread of information, and a 
hypothesis is used to model individuals’ 
behavior and decision-making during 
self-evacuation [18]. 


A mobile phone-based crowd 
monitoring framework was proposed by 
[19], which uses Wi-Fi and Bluetooth readings 
to estimate the crowd; the authors claim that 
their solution is better and lower in cost. 
An automated surveillance system that applies 
a computational object recognition system 
to a video stream was proposed by [20]. 
This system automatically tracks and 
identifies individuals in the crowd by 
using a convolutional neural network (CNN). 
Similarly, another framework 
named ‘crowd Probe’ was proposed to 
monitor crowds in indoor settings. It 
utilizes a hidden Markov model to extract 
trajectories of individuals from Wi-Fi 
monitors [21]. 


Another relevant work on dynamic 
positions and events, aimed at managing a large 
crowd according to capacity to avoid 
casualties, was proposed by [22]. A smart 
queue approach at various 
entry points was studied to reduce the waiting time in 
the system as a function of the number of 
multiple entry/exit loops, social distancing, 
and arrival rates of pilgrims [23]. A 
WSN-based identification model using grouping 
techniques and different operational phases 
was also designed to manage the crowd. The 
optimization of crowd monitoring has also been 
discussed: a crowd monitoring solution based on 
a Wi-Fi/Bluetooth interface is proposed, in which 
a large number of datasets 
are first prepared, and localization and 
filtration are then performed to estimate crowd 
density [24]. In [25], the authors proposed a 
real-time surveillance-based system named 
smartiSS. The proposed system works by 
monitoring the MAC IDs of 
individual devices. 


A spatio-temporal visualization of 
crowd movement, using large-scale data 
in the form of a transition graph, was 
proposed in [26]. A behavior analysis 
approach using a generative model, 
the hidden Markov model, was proposed to help crowd 
managers make good decisions 
in invoking Internet-of-Things (IoT) 
services. The approach is based on 
spatio-temporal flow-blocks for the 
marginalization of an arbitrarily dense flow 
field [27]. The adoption and utilization of 
fences to guide crowd movement was also 
studied, with a focus on optimizing the fence 
layout for efficient crowd management by 
using a simulation model [28]. A smart image 
enhancement and quality control system for 
resource pooling and allocation, 
applied to crowd management systems, was 
designed in [29]. 
A model developed by using 
Unity 3D was also proposed. A video of persons 
walking and interacting is used for simulation, and the 
results are presented by using visualization 
techniques [30]. A framework named 
‘icrowd’ was proposed for 
monitoring and tracking individuals. 
The framework comprises 
three layers and presents a real-time view of 
individuals based on their real-time 
location [31]. 
In another effort, the authors presented a 
method to link physics-based 
animations and multi-agent modeling to 
improve the efficiency of crowd simulation [32]. 


An improved version of the multi-agent 
reinforcement learning algorithm, in which 
learning is performed on information extracted 
from videos, was presented in [33]. For the 
simulation of crowds, 
an algorithm was proposed to utilize the 
trajectories of pedestrians from videos and 
simulate them as agents [34]. Similarly, a 
solution for the 
simulation of a large number of agents, 
based on non-volatile RAM (NVRAM) and an 
agent-based architecture for computing the 
simulation at a large scale, was proposed by 
[35]. An agent-based simulation 
application was proposed by [36]. The 
proposed method comprises three 
modules: a simulation environment 
for simulating the agents, an agent model, and 
an output module for visualizing the simulated results. 
Mass-motion was used to optimize the 
crowd flow rate with density restricted to a 
safe threshold value for efficient crowd 
management. A robust regression model 
was developed to guide the authorities in 
the safe and efficient operation of the 
visiting corridor [37]. For the prediction of 
stampedes in large crowds, an architecture was 
proposed that uses mobile sensing 
and an IoT network combined with a wireless 
multimedia sensing network. The authors 
considered mobile devices as nodes for 
crowd monitoring [38]. A vision-based 
method to prevent the collision of agents in 
a simulation environment was proposed by 
[39]. By using cognitive science, the authors also 
predicted upcoming collisions among 
agents. By using navigation fields, a method 
was presented to direct the agents in 
the simulation environment to monitor 
virtual crowds [40]. A human-like 
intelligent agent model governed by 
fuzzy-logic rules, which models, 
simulates, and visualizes crowd dynamics 
applicable to emergency situations, was proposed in [41]. 
Heuristics were found to play 
multiple roles in crowd management, 
including crowd recognition, tracking, 
and congestion [42]. 


The literature review shows that multiple 
solutions based on simulation and 
visualization methods have been proposed 
for monitoring and evacuating individuals 
during critical situations in large crowds. 
Some techniques involve real-time crowd 
monitoring, the use of Wi-Fi/Bluetooth for 
extracting user information, or data 
generated by a simulator for the simulation and 
visualization of crowds. Certain limitations 
were found in the existing systems: some 
were tested on limited datasets and were not 
scalable to large crowds. Similarly, most 
of them did not discuss the context before 
and after an incident, while 
some discussed only crowd 
simulation. The current research presents a 
framework capable of modeling, 
simulating, and visualizing large 
crowds based on distributed approaches to 
monitor the movement and interaction of 
individuals through 
localized sensing and geospatial analysis 
of massive positional data. 


III. PRELIMINARY 


The applicable entities have been classified 
into modeling layers. The individual model 
decides what an individual should do 
given the status of the entities around it. 
The context layer applies the 
relevant context of an individual, whereas 
the venue model gives information about the 
surroundings, as shown in Figure 1. 


A. INDIVIDUAL 


Individuals in a crowd are 
regarded as agents in a world. Let the 
persons be denoted as $p_1, p_2, p_3, \ldots, p_n$. 
Every individual has some attributes and 
behaviors with respect to their identity and 
place. Let the identity of a person be denoted 
by a unique ID, $id_1, id_2, id_3, \ldots, id_m$. 
As every person has a unique identity, the 
persons are denoted as in (1). 


$p_1.id_1,\ p_2.id_2,\ p_3.id_3,\ \ldots,\ p_n.id_n$ (1) 


Attributes exist at more than one level. The 
basic level covers distinguishing attributes, 
for instance, name, age, and gender. The 
next level covers crowd-related 
attributes, for instance, location and 
velocity. These multi-level attributes 
collectively form a profile that helps in 
identifying individuals in a large crowd. 
Some general behaviors are associated with 
the type of agent, while others are specific 
to tasks and places. Many related actions 
are triggered when an individual performs a 
certain task. 


B. VENUE 


Large crowds occur due to an event 
happening at a specific venue, which is 
divided into regions. Let the 
regions be denoted as $r_1, r_2, r_3, \ldots, r_k$. 
Each region is further divided into 
zones and pathways. Let the 
zones in a region be denoted as 
$z_1, z_2, z_3, \ldots, z_s$. As one region can have many 
zones, the regions are denoted as in 
(2), (3), and (4). 


$r_1 = z_1.r_1,\ z_2.r_1,\ \ldots,\ z_5.r_1$ (2) 
$r_2 = z_1.r_2,\ z_2.r_2,\ \ldots,\ z_9.r_2$ (3) 
$r_k = z_1.r_k,\ z_2.r_k,\ \ldots,\ z_s.r_k$ (4) 


This shows that there are multiple zones in a 
region. The venue map is used as a 
background image in the simulation, calibrated 
with real-world settings to associate pixel 
values with specific places inside the 
simulated environment. The positional 
coordinates of moving individuals are those 
of the actual surroundings. An event, in 
general, can hold many sub-events. 
Sub-events can also be 
defined so that a specific place is created 
for the sub-event and, when the sub-event ends, 
the place is removed from the simulation. 
The venue of an event 
generally has two components: points of 
interest for individuals and the available 
paths to reach them. 


FIGURE 1. System Design 


1) POINT OF INTEREST 


A point of interest is any place inside the 
created world that may be an originating 
position for some and a destination for 
others. Some places remain throughout the 
simulation, while others are occasional. For 
instance, in a spring festival, there are 
bookstalls in a park during the daytime, and at 
night the place is transformed into a party 
area. Let $pi_x$ denote a point of interest, 
so that $pi_1, pi_2, pi_3, \ldots, pi_j$ are the different 
points of interest in the scenario, where the 
value of x ranges from 1 to j. 


2) PATHWAYS 


The presence of walls imposes 
constraints on the area of movement for 
individuals. These walls are created by 
applying an alpha image of the background in 
which walls, where individuals cannot walk, 
are white. The rest forms the 
pathways and points of interest that facilitate 
the movement of agents. 
Let $pw_k$ denote a pathway, so that the pathways are 
$pw_1, pw_2, pw_3, \ldots, pw_y$, where the value of k 
ranges between 1 and y. Since 
one zone may have many 
pathways, let $pw_i.z_j$ denote 
pathway i in zone j. Thus, the pathways in 
different zones are denoted as in 
(5), (6), and (7). 


$z_1 = pw_1.z_1,\ pw_2.z_1,\ \ldots,\ pw_5.z_1$ (5) 
$z_2 = pw_1.z_2,\ pw_2.z_2,\ \ldots,\ pw_9.z_2$ (6) 
$z_s = pw_1.z_s,\ pw_2.z_s,\ \ldots,\ pw_y.z_s$ (7) 
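

The notation in (2) to (7) describes a containment hierarchy: a venue holds regions, a region holds zones, and a zone holds pathways. A minimal Java sketch of that hierarchy is given below. The class and method names (VenueModel, addZone, addPathway, qualifiedZones, qualifiedPathways) are illustrative assumptions and are not taken from the implemented framework.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: all names here are hypothetical.
class VenueModel {
    // region id -> (zone id -> pathway ids); insertion order is preserved.
    private final Map<String, Map<String, List<String>>> regions = new LinkedHashMap<>();

    // Registers a zone inside a region, creating the region if needed.
    void addZone(String region, String zone) {
        regions.computeIfAbsent(region, r -> new LinkedHashMap<>())
               .computeIfAbsent(zone, z -> new ArrayList<>());
    }

    // Registers a pathway inside a zone, creating the zone if needed.
    void addPathway(String region, String zone, String pathway) {
        addZone(region, zone);
        regions.get(region).get(zone).add(pathway);
    }

    // Zones of a region in the dotted form of (2)-(4), e.g. "z1.r1".
    List<String> qualifiedZones(String region) {
        List<String> out = new ArrayList<>();
        Map<String, List<String>> zones = regions.get(region);
        if (zones == null) {
            return out;
        }
        for (String zone : zones.keySet()) {
            out.add(zone + "." + region);
        }
        return out;
    }

    // Pathways of a zone in the dotted form of (5)-(7), e.g. "pw1.z1".
    List<String> qualifiedPathways(String region, String zone) {
        List<String> out = new ArrayList<>();
        Map<String, List<String>> zones = regions.get(region);
        if (zones == null || !zones.containsKey(zone)) {
            return out;
        }
        for (String pathway : zones.get(zone)) {
            out.add(pathway + "." + zone);
        }
        return out;
    }

    public static void main(String[] args) {
        VenueModel venue = new VenueModel();
        venue.addPathway("r1", "z1", "pw1");
        venue.addPathway("r1", "z1", "pw2");
        venue.addZone("r1", "z2");
        System.out.println(venue.qualifiedZones("r1"));
        System.out.println(venue.qualifiedPathways("r1", "z1"));
    }
}
```

Nested insertion-ordered maps keep the sketch short; a real venue model would likely also carry the geometry of each zone and pathway.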


IV. PROPOSED ARCHITECTURAL 
FRAMEWORK 


This section explains the components of 
the proposed framework, which follows a 
layered architecture. The layers and 
components of the proposed 
framework are presented in Figure 2. 
A distributed positional sensing 
infrastructure is placed at the physical layer, 
which is the source of quantitative movement 
data. A big data environment, containing 
distributed computing platforms and a data 
ingestion component, constitutes the 
processing layer, which systematically preserves 
and processes the anticipated massive 
geospatial datasets. Multiple visualizations of a 
moving crowd are provided at the 
application layer through pre-analytics 
and post-analytics visualization modes. 


A. PHYSICAL LAYER 


This layer consists of the sensing infrastructure, 
which contains devices capable of 
transmitting positional data and crowd 
sensing nodes for the reception of that data. 
Numerous techniques for sensing crowd 
movement by using location-aware devices 
have been suggested, such as RFID 
wrist/ankle bands and proximity-based 
sensing via Bluetooth [43]. The sensing 
infrastructure operates in a distributed 
fashion, as the distribution of operations is the 
essence of the system design and is maintained at 
every layer of the proposed architecture. 


1) CROWD SENSING NODE 


A sensing node is used to capture positional 
updates from location-aware devices. The 
node collects the 
data within a certain vicinity and processes 
local information about crowd 
conditions. The collected information 
can then be sent to a nearby participant if 
requested by the participant’s connected device. These 
nodes are placed at strategic junctions 
across the venue to cover the whole event. 
The connected nodes gather crowd 
information, create swarms, and then 
periodically transmit the data to the processing 
infrastructure. 
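

The behavior of a sensing node described above (accept updates only from its vicinity, keep local crowd information, and periodically hand a batch to the processing layer) can be sketched as follows. This is a minimal sketch under stated assumptions: the class SensingNode, its Update type, the circular coverage radius, and the use of the haversine great-circle distance are illustrative choices and are not specified by the framework.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only: class, field, and method names are hypothetical.
class SensingNode {
    // One positional update received from a location-aware device.
    static final class Update {
        final String deviceId;
        final double lat;
        final double lon;
        final long timestampMs;

        Update(String deviceId, double lat, double lon, long timestampMs) {
            this.deviceId = deviceId;
            this.lat = lat;
            this.lon = lon;
            this.timestampMs = timestampMs;
        }
    }

    private final double lat;
    private final double lon;
    private final double radiusMeters; // assumed circular coverage area
    private final List<Update> buffer = new ArrayList<>();

    SensingNode(double lat, double lon, double radiusMeters) {
        this.lat = lat;
        this.lon = lon;
        this.radiusMeters = radiusMeters;
    }

    // Great-circle distance in meters (haversine formula, mean Earth radius).
    static double distanceMeters(double lat1, double lon1, double lat2, double lon2) {
        double earthRadius = 6_371_000.0;
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * earthRadius * Math.asin(Math.sqrt(a));
    }

    // Buffers the update only if the device is inside this node's vicinity.
    boolean receive(Update u) {
        if (distanceMeters(lat, lon, u.lat, u.lon) > radiusMeters) {
            return false;
        }
        buffer.add(u);
        return true;
    }

    // Local crowd information: number of distinct devices currently buffered.
    int localCount() {
        Set<String> ids = new HashSet<>();
        for (Update u : buffer) {
            ids.add(u.deviceId);
        }
        return ids.size();
    }

    // Periodic transmission: returns the buffered batch and clears the buffer.
    List<Update> flush() {
        List<Update> batch = new ArrayList<>(buffer);
        buffer.clear();
        return batch;
    }
}
```

A caller would invoke flush() on a timer and forward the returned batch to the data collection service; the timer and the transport are left out of the sketch.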


2) SIMULATION FRAMEWORK 


Simulation can be used to design a 
problematic scenario and a mechanism to 
evaluate the behavior and operation of the 
chosen solution or system. Crowd simulation 
attempts to give an appropriate abstraction 
of this domain. Therefore, another aspect of 
the current research is to simulate large 
events along with the usual size of 
crowd they involve. Large events 
cannot be recreated to deploy and test the 
actual viability of the proposed crowd 
monitoring system at such a scale. 
However, possible scenarios in a large 
event, if simulated and analyzed beforehand, 
can be useful in devising appropriate 
strategies to efficiently manage large 
crowds on the day of the event. 


B. PROCESSING LAYER 


The processing layer is composed of 
several interdependent components. At the 
bottom of this layer is the data ingestion 
and mapping component which spans from 
the collection of data to its filtration and 
modeling. An important component in this 
layer is the big data environment, 
specifically intended to incorporate the 
composition of several distributed 
processing frameworks. The layer is 
designed in a way that the incorporation of 
heterogeneous elements below and above 
this layer would not affect the transparency 
of its operations. Each component of the 
layer is described in detail below. 


1) DATA COLLECTION SERVICE 


A data collection service is placed at the 
bottom of the processing layer and is 
responsible for ingesting data from the 
physical layer or the simulator into the system. 
The service is generic to ensure that the 
system is not affected by the heterogeneity 
of devices in a physical sensing 
infrastructure or the operational differences 
of distributed processing frameworks. 


2) DATA FILTRATION 


The collected data needs to be filtered on 
certain parameters to reduce the workload 
of both storage and processing. Therefore, 
the essential features needed to uniquely 
identify individuals and analyze crowd 
conditions are extracted, and the rest of the 
data is filtered out. 


3) DATA MODELLING 


The data is modeled for the identification of 
individuals. The model is event-specific and is 
based on unique ID, group, nationality, age, gender, 
participant’s location, and timestamp, as 
shown in Figure 3. Data modeling also 
depends on the nature of large events, 
as the movement of an individual is 
tied to the movement of a group, which 
helps to identify the participants in a large 
crowd. 
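

As an illustration of this data model, the sketch below holds the fields listed for a participant and converts them into a nested map that resembles a stored document. The class name ParticipantRecord, the field names, and the document keys are assumptions made for this example. The location is laid out as a GeoJSON point, which lists longitude before latitude.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: field and key names are hypothetical.
class ParticipantRecord {
    final String uniqueId;
    final String name;
    final int age;
    final String gender;
    final String nationality;
    final String groupId;
    final String groupRole;
    final double latitude;
    final double longitude;
    final long timestampMs;

    ParticipantRecord(String uniqueId, String name, int age, String gender,
                      String nationality, String groupId, String groupRole,
                      double latitude, double longitude, long timestampMs) {
        this.uniqueId = uniqueId;
        this.name = name;
        this.age = age;
        this.gender = gender;
        this.nationality = nationality;
        this.groupId = groupId;
        this.groupRole = groupRole;
        this.latitude = latitude;
        this.longitude = longitude;
        this.timestampMs = timestampMs;
    }

    // Builds a document-like map; the location is a GeoJSON point
    // whose coordinates are ordered [longitude, latitude].
    Map<String, Object> toDocument() {
        Map<String, Object> location = new LinkedHashMap<>();
        location.put("type", "Point");
        location.put("coordinates", List.of(longitude, latitude));

        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("uniqueId", uniqueId);
        doc.put("name", name);
        doc.put("age", age);
        doc.put("gender", gender);
        doc.put("nationality", nationality);
        doc.put("groupId", groupId);
        doc.put("groupRole", groupRole);
        doc.put("location", location);
        doc.put("timestamp", timestampMs);
        return doc;
    }
}
```

Keeping the group ID and group role next to the position is what allows an individual to be located through the movement of the group.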


4) BIG DATA ENVIRONMENT 


The influx of data from the physical layer is 
expected to be huge in terms of volume and 
velocity. Relational data stores and warehouses 
have limitations in latency and scalability 
when processing millions of records 
from the simulator and in real time. Therefore, big 
data processing frameworks are 
included in the system as the 
driving forces behind the extraction of 
value from that enormous volume of data. 


The environment contains MongoDB, a 
robust and scalable NoSQL data store, and 
Spark, an in-memory cluster computing 
framework for lightning-fast analytics. 
Together they give a strong combination for 
real-time analytics. MongoDB 
itself has a wide array of query operations 
that work efficiently on a huge volume 
of locational data. However, the inclusion 
of Spark enhances the capabilities of the 
environment, as different levels of crowd 
information can be maintained by using 
Spark for rich analytics at both the crowd and 
individual level. These frameworks use 
multiple machines to manipulate the data 
through distributed computing. 


FIGURE 2. Proposed Architecture Framework 


5) REAL TIME ANALYTICS 


The current study intends to enhance and 
ensure the safety of individuals involved in 
an immensely crowded event. Therefore, 
information on crowd situations in real time 
is of utmost importance. The data collected 
by the system is analyzed and processed 
concurrently to deduce the required output 
and gain insight into the crowd situation. 


6) PREDICTIVE ANALYTICS 


Stampedes and crushes can be avoided if 
crowd managers and authorities are 
proactive in devising strategies to 
deal with critical situations. Advance 
knowledge of crowd conditions attained 
through predictive analytics can help them 
to a significant extent. These predictions are 
formulated by analyzing previous data 
and building models from that data. 


Participant data model fields: Unique ID, Name, Age, Gender, Nationality, 
Group ID, Group Role, Location (Latitude, Longitude), Timestamp 


FIGURE 3. Participant’s Data Model 


C. APPLICATION LAYER 


The purpose of setting up the processing 
infrastructure is to build applications on top 
of it that enable crowd modelers and crowd 
observers to effectively manage large 
events. The authorities can 
visualize crowd movement and the formulated 
analytics to gain insight into rapidly 
changing conditions, as shown in Figure 4. 


1) PRE-ANALYTICS 
VISUALIZATIONS 


The portrayal of the on-ground crowd 
condition is important for the assessment of 
the situation before analysis. Pre-analytics 
visualizations are, therefore, provided for 
the thorough monitoring of crowd 
movement at large. The movement of 
individuals in a real crowd can be 
observed as moving agents in a simulated 
environment. The context of every 
individual changes with its position, 
heading, or ongoing activity. Therefore, 
information on the participant’s context is 
provided in a way that monitoring 
authorities can specifically track a 
participant. 

2) POST-ANALYTICS 
VISUALIZATIONS 


The visualizations formulated after the 
transformation and processing of 
individuals’ positional data are of great 
importance for monitoring and organizing 
massively crowded events. These 
visualizations are formed by deriving 
information on different crowd parameters, 
such as crowd density and crowd flow. The 
resulting graphics reflect the 
population of participants within a certain 
vicinity over various time frames. 
Heatmaps generated from crowd density can 
give a bird’s-eye view of the crowd 
situation and may allow crowd 
monitoring authorities to easily identify 
the congested and free-flowing areas in 
large events. 
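

One simple way to derive the density information behind such a heatmap is to bin positions into a regular grid and count the participants per cell. The sketch below illustrates this idea only. The class DensityGrid, the cell size expressed in degrees, and the congestion threshold are assumptions for the example, and a head count per cell is used as a proxy for density rather than persons per square meter.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: names and the binning scheme are hypothetical.
class DensityGrid {
    private final double minLat;
    private final double minLon;
    private final double cellDeg; // cell size in degrees (assumed square cells)
    private final int rows;
    private final int cols;
    private final int[][] counts;

    DensityGrid(double minLat, double minLon, double cellDeg, int rows, int cols) {
        this.minLat = minLat;
        this.minLon = minLon;
        this.cellDeg = cellDeg;
        this.rows = rows;
        this.cols = cols;
        this.counts = new int[rows][cols];
    }

    // Adds one position; returns false if it lies outside the grid.
    boolean add(double lat, double lon) {
        int row = (int) Math.floor((lat - minLat) / cellDeg);
        int col = (int) Math.floor((lon - minLon) / cellDeg);
        if (row < 0 || row >= rows || col < 0 || col >= cols) {
            return false;
        }
        counts[row][col]++;
        return true;
    }

    // Head count of one cell.
    int count(int row, int col) {
        return counts[row][col];
    }

    // Cells whose head count exceeds the threshold, as {row, col} pairs.
    List<int[]> congestedCells(int threshold) {
        List<int[]> cells = new ArrayList<>();
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                if (counts[r][c] > threshold) {
                    cells.add(new int[] {r, c});
                }
            }
        }
        return cells;
    }
}
```

Rendering each cell with a color proportional to its count gives the heatmap, and building one grid per time frame shows how congestion shifts over time.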


FIGURE 4. Crowd Movement (panels: Simulated Crowd Movement, Real Crowd Movement) 


V. IMPLEMENTATION DETAILS 


A proof-of-concept cycle of the proposed 
framework is presented in the 
current study. The development work 
includes the simulation of large events, 
real-time analytics on the positional data of 
individuals in those events, and 
multi-dimensional visualizations of crowd 
conditions. Java is used as the development 
language at every layer of the architecture, 
as comprehensive APIs of the chosen 
frameworks are available in Java. 


A. PHYSICAL LAYER 


A Java-based simulator has been used at the 
physical layer; however, the proposed 
architecture suggests the placement of a 
distributed sensing infrastructure to gather 
the positional data. 


1) LOCATION AWARE DEVICES 


Numerous devices are capable 
of acting as location-aware gadgets, for 
instance smartphones, smart devices such 
as GPS wrist/ankle bands and smartwatches, 
proximity-based sensing via 
Bluetooth/WLAN scanners, and infrared-based 
human presence sensors, as shown in 
Figure 5. 


FIGURE 5. Location Aware Devices 


2) DATA COLLECTION NODES 


A data collection node is used to capture the 
positional updates from location-aware 
devices. In the context of large events, a data 
collection node 
collects the data within a certain vicinity 
and processes some local information about 
the crowd condition. The collected information 
can further be sent to a nearby participant if 
requested by the participant’s connected device. These 
nodes are placed at different junctions and 
distributed across the venue to cover the 
whole event. 


3) SIAFU TOOL FOR SIMULATION 


For the simulation of large crowds, a 
comprehensive tool named Siafu, 
which is a multi-agent system, is used. The 
simulation includes agents and the context 
therein. Siafu can generate its own context 
and, at the same time, can 
augment real context coming from 
sensors. Simulation serves as a means to 
design a problematic scenario and a 
mechanism to evaluate the behavior and 
operation of the chosen solution or system. 
Crowd simulation attempts to give an 
appropriate abstraction of this domain. 
Therefore, the other aspect of the proposed 
framework is to simulate large events along 
with the usual size of crowd they 
involve. Firstly, large events cannot be 
recreated to deploy and test the proposed 
crowd monitoring system’s actual viability 
at such a scale. Moreover, possible 
scenarios in a large event, if simulated and 
analyzed beforehand, can be useful in 
devising appropriate strategies to 
efficiently manage a large crowd on the day 
of the event. 


A generic context simulator [44] was used 
for the basic modeling of entities in a 
scenario. The simulator itself is broad in 
terms of information sources; however, it is 
limited in terms of event-driven activities 
and behaviors. The concept was extended 
towards immensely crowded events and 
the activities of the individuals they 
involve. The simulator works as a 
Multi-Agent System (MAS). For the 
realistic simulation of such events, the MAS 
approach is considered suitable, as the 
agents are expected to move toward their goals, 
interact with their environment, and 
respond to each other. MAS is also 
regarded as a preferred approach for 
emergency evacuation simulations [3] 
because it models problems in terms of 
autonomous interacting component agents, 
which is a more natural way 
of simulating the unpredictability of 
large crowds. The simulator can generate its 
own context and at the same time can 
augment the context coming from real-time 
data sensors. The simulated world includes 
models for agents, the world, and the 
context therein. To create random 
participants, every participant is assigned a 
unique ID, group ID, age, nationality, and 
activity. A color is assigned according to the 
gender of the participant. Moreover, a 
distinct color is 
assigned to the group leader, as explained in 
Algorithm I. To create a random population in 
world w, the population count is assigned to 
variable C, and the group limit and group check are 
assigned to variables L and G, respectively. 
After the iteration, random agents are created 
and the list is returned, as explained in 
Algorithm II. 


Algorithm - I: CREATE RANDOM PARTICIPANT (World w, C, G)

1: let P be the object of participant's class
2: let R be the random number between -2 and 1
3: let COLOR be the participant's color
4: let L be the group limit
5: let C be the population count
6: P.set(ID, unique ID)
7: P.set(GROUP ID, G_ID)
8: P.set(AGE, Call Random(Age))
9: P.set(NATIONALITY, Call Random(NationalityList))
10: P.set(ACTIVITY, Call Random(Activity))
11: if R = 0 then
12:     COLOR := Blue
13:     P.set(GENDER, male)
14: else
15:     COLOR := Brown
16:     P.set(GENDER, female)
17: if C % L = 0 then
18:     COLOR := Magenta
19:     P.set(ROLE, leader)
20: else
21:     P.set(ROLE, normal)


Algorithm I. Create Random Participants


The MongoDB Java driver is selected, as Java
is the development language. It provides both
synchronous and asynchronous interaction
with the application. Four shared
collections are created to meet the required
functionality of the system. The collections
and their primary functions are shown in
Table I. Spherical GeoJSON objects are
used to store the locational data. A
participant's position is stored as a point,
while zone bounds are saved as a polygon.
Every participant is assigned a unique ID,
a group ID, and a MongoDB role object
describing the participant's role. Using a
MongoDB query, every participant's unique
ID, group ID, role, location, and
coordinates are inserted into the database,
as shown in Figure 6, to continuously update
the information of individuals, so that
every individual can be monitored and
located based on his/her trail. An inclusion
geospatial query, which finds the points
confined in a polygon, is executed and
returns the number of participants in the
defined region. MongoDB creates a unique
index on the ID field by default upon the
creation of a collection. A geospatial index
is created on the location field of a record,
and a single-field index on the timestamp.
Together, these indexes improved the
performance of queries. Spark's Java API is
used to write scripts to analyze the
operational data managed by MongoDB.
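The inclusion query described above can be illustrated without a running database. The following Python sketch builds a GeoJSON Point for each participant and a GeoJSON Polygon for a zone, shows the shape of a MongoDB `$geoWithin` filter as a plain dictionary, and emulates the count with a planar ray-casting test. It is not the authors' Java implementation: the field names (`participantId`, `groupId`, `role`, `location`), the coordinates, and the planar approximation are assumptions made for illustration. Note that GeoJSON orders coordinates as [longitude, latitude].

```python
def point(lon, lat):
    """GeoJSON Point; GeoJSON orders coordinates as [longitude, latitude]."""
    return {"type": "Point", "coordinates": [lon, lat]}


def polygon(ring):
    """GeoJSON Polygon with one outer ring; the ring is closed automatically."""
    ring = [list(p) for p in ring]
    if ring[0] != ring[-1]:
        ring.append(ring[0])
    return {"type": "Polygon", "coordinates": [ring]}


def inside(pt, poly):
    """Planar ray-casting test: is the point inside the polygon's outer ring?"""
    x, y = pt["coordinates"]
    ring = poly["coordinates"][0]
    hit = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Count the edges crossed by a horizontal ray starting at the point.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                hit = not hit
    return hit


# Hypothetical zone and three hypothetical participant records.
zone = polygon([(39.980, 21.350), (39.990, 21.350),
                (39.990, 21.360), (39.980, 21.360)])
records = [
    {"participantId": 1, "groupId": 1, "role": "leader",
     "location": point(39.985, 21.355)},
    {"participantId": 2, "groupId": 1, "role": "normal",
     "location": point(39.989, 21.351)},
    {"participantId": 3, "groupId": 2, "role": "normal",
     "location": point(39.995, 21.355)},
]

# Shape of the equivalent MongoDB filter (built here, not executed).
mongo_filter = {"location": {"$geoWithin": {"$geometry": zone}}}

# Number of participants whose position falls inside the zone.
zone_population = sum(inside(r["location"], zone) for r in records)
```

In a deployment, the filter dictionary would be passed to the database driver, and the geospatial index on the location field would let the server avoid scanning every record.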


Algorithm - II: CREATE RANDOM POPULATION (World w)

1: let C be the population count
2: let P be the list of participants with size C
3: let L be the group limit
4: let G be the group check
5: G := 1
6: For i = 1 to C do
7:     if i % L = 0 then
8:         INCREMENT G
9:     End if
10:     P.add(CALL CREATE RANDOM AGENT(w, i, G))
11: End For
12: RETURN P


Algorithm II. Create Random Population
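The two population algorithms can be sketched together in Python. This is a minimal illustration and not the authors' Java implementation: the `Participant` fields mirror the attributes named in the text, while the nationality and activity lists, the age range, the `seed` parameter, and drawing R from {0, 1} are assumptions made for the example.

```python
import random
from dataclasses import dataclass

# Hypothetical value lists; the paper does not enumerate them.
NATIONALITIES = ["PK", "ID", "TR", "NG", "EG"]
ACTIVITIES = ["walking", "resting", "praying"]


@dataclass
class Participant:
    id: int
    group_id: int
    age: int
    nationality: str
    activity: str
    gender: str
    role: str
    color: str


def create_random_participant(count, group_id, group_limit, rng):
    """Build one participant with random attributes (participant algorithm)."""
    # Assumption: R is drawn from {0, 1}, giving an even gender split.
    if rng.randint(0, 1) == 0:
        gender, color = "male", "blue"
    else:
        gender, color = "female", "brown"
    # Every L-th participant becomes a group leader, drawn in magenta.
    if count % group_limit == 0:
        role, color = "leader", "magenta"
    else:
        role = "normal"
    return Participant(
        id=count,
        group_id=group_id,
        age=rng.randint(18, 80),  # assumed age range
        nationality=rng.choice(NATIONALITIES),
        activity=rng.choice(ACTIVITIES),
        gender=gender,
        role=role,
        color=color,
    )


def create_random_population(population_count, group_limit, seed=None):
    """Create C participants, opening a new group every L agents."""
    rng = random.Random(seed)
    group = 1
    participants = []
    for i in range(1, population_count + 1):
        if i % group_limit == 0:
            group += 1
        participants.append(
            create_random_participant(i, group, group_limit, rng))
    return participants


population = create_random_population(population_count=20, group_limit=5, seed=42)
leaders = [p.id for p in population if p.role == "leader"]
```

With a group limit of 5, every fifth participant is flagged as a leader and starts a new group, which follows the increment-then-create order of the population algorithm.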


TABLE I
SHARED COLLECTION
Collection | Primary Function | Frequency of Use
Participant | To store participants' data records. | The records are inserted continuously.
Zone | To store information on zones. | The records are inserted and retrieved for zone-based crowd analysis.
Real-Time Analytics | To store crowd parametric information with respect to zone or otherwise. | The results are stored and retrieved continuously.
Predictive Analytics | To store predictive information on crowd situation. | The results are updated after a certain time.


B. APPLICATION LAYER 


The application layer plays an important
role in the modeling and monitoring of
large events. Initially, a desktop application
was developed to capitalize on
the processing infrastructure and to give
crowd modelers and authorities a platform
to model and monitor crowd
movement at large.


C. CROWD ANALYTICS
APPLICATION


This application is developed using
JavaFX, to make the most of its extensive
chart libraries. The application is divided
into three modes. Each mode is designed and
developed to provide a specific set of
functionalities. The whole application is
built on a customized framework of screens
so that modifications and
adjustments can be incorporated with ease.


[Figure: GeoJSON Point and GeoJSON Polygon objects, each with a "geometry" member holding "type" and "coordinates"]


FIGURE 6. GeoJSON Objects and Query
1) DEPLOYMENT MODE 


The application allows the modeler to
generate the resources that are required in
the simulation. The modeler can specify the
coordinates of the venue to get a real map
to be used as a background in the
simulation. Pathways and points of interest
can be created with a single click.
Moreover, the modeler can drag and drop a
polygon over the venue map to divide
the venue into different zones. The
polygonal bounds obtained in pixels are
converted into positional coordinates,
provided that the bounds of the venue are
defined.
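The pixel-to-coordinate conversion can be sketched as a linear interpolation between the venue bounds. The Python example below is an assumption about how such a conversion might work, not the paper's stated method: it supposes a north-up, unrotated map image with its pixel origin at the top-left corner, and the names `VenueBounds`, `pixel_to_coordinate`, and `polygon_to_coordinates` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VenueBounds:
    """Geographic bounds of the venue image (assumed north-up, not rotated)."""
    north: float  # latitude of the top edge
    south: float  # latitude of the bottom edge
    west: float   # longitude of the left edge
    east: float   # longitude of the right edge


def pixel_to_coordinate(px, py, width, height, bounds):
    """Linearly interpolate a pixel (origin at top-left) to (lat, lon)."""
    lon = bounds.west + (px / width) * (bounds.east - bounds.west)
    lat = bounds.north - (py / height) * (bounds.north - bounds.south)
    return lat, lon


def polygon_to_coordinates(pixels, width, height, bounds):
    """Convert every vertex of a zone polygon drawn on the image."""
    return [pixel_to_coordinate(px, py, width, height, bounds)
            for px, py in pixels]


# Hypothetical venue bounds and a zone drawn on an 800 x 600 image.
bounds = VenueBounds(north=21.36, south=21.35, west=39.98, east=39.99)
zone_px = [(0, 0), (400, 0), (400, 300), (0, 300)]
zone_geo = polygon_to_coordinates(zone_px, width=800, height=600, bounds=bounds)
```

A linear mapping is a reasonable approximation only for venues small enough that map projection distortion is negligible.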


2) SIMULATION MODE 


A few batch scripts are created to resolve
the dependencies involved in running a
simulation. These scripts are triggered and
run in the background while the
functionalities of the simulation
mode are in use. The simulation mode is
used to update the simulation by loading the
resources newly created in the deployment
mode. The observer can start the simulation
once it is loaded.


3) ANALYTIC MODE 


This mode is developed for the monitoring
of crowds. Real-time information on
several crowd parameters can be observed.
The crowd observer can also retrieve
historical crowd conditions. Moreover,
zone-level analysis can be performed
to follow an individual's trail in case of
a critical incident or a violent
attack. Individuals present at the
scene before or after a crime
can be tracked from their trails. A critical
situation is easier to identify when the
information extracted from the data is
displayed in the form of graphs and maps,
and when alerts are generated from the
capacity information. To render a heat map,
each zone on the image is filled with a
color value held in a variable C, as
presented in Algorithm III.


Algorithm - III: HEAT MAP GENERATION (Overlay O)

1: let TX be the top left horizontal point of zone on image
2: let TY be the top left vertical point of zone on image
3: let BX be the bottom right horizontal point of zone on image
4: let BY be the bottom right vertical point of zone on image
5: let C be the color value
6: FOR i = TX to BX do
7:     FOR j = TY to BY do
8:         O.setPixelValue(i, j, C)
9:     END FOR
10: END FOR


Algorithm III: Heat Map Generation


4) CROWD INFORMATION CHARTS


The information obtained through analytics
on positional data is presented in the form
of charts, as shown in Figure 7. A dynamic
line graph illustrating real-time crowd flow
is generated. Moreover, a static line graph
showing historical crowd flow with respect
to recorded timestamps is also produced.
For the crowd information charts, the zone
area is calculated as shown in Algorithm IV.


Algorithm - IV: ZONE AREA CALCULATION

1: let TL be the top left point of zone
2: let BR be the bottom right point of zone
3: let R be the earth's radius
4: let PI be the value of pi
5: let Area be the zone area in square units
6: Area = (PI/180) * (R*R) * ABS(SIN(PI/180*TL.latitude) - SIN(PI/180*BR.latitude)) * ABS(TL.longitude - BR.longitude)


Algorithm IV: Zone Area Calculation
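The zone area calculation can be checked with a short Python sketch. It uses the standard formula for the area of a latitude/longitude rectangle on a sphere; the mean Earth radius of 6,371,000 m and the function name `zone_area` are assumptions, since the paper does not state the value of R or the units used.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres (assumed value of R)


def zone_area(tl_lat, tl_lon, br_lat, br_lon, radius=EARTH_RADIUS_M):
    """Area of a latitude/longitude rectangle on a sphere, in square metres.

    tl_* is the top-left corner and br_* the bottom-right corner, in degrees.
    """
    lat_term = abs(math.sin(math.radians(tl_lat)) - math.sin(math.radians(br_lat)))
    lon_term = abs(tl_lon - br_lon)  # degrees; converted by the PI/180 factor
    return (math.pi / 180.0) * radius * radius * lat_term * lon_term


# Hypothetical zone of 0.01 x 0.01 degrees.
area = zone_area(21.36, 39.98, 21.35, 39.99)

# Sanity check: a rectangle covering the whole globe has area 4 * pi * R^2.
whole_sphere = zone_area(90.0, -180.0, -90.0, 180.0)
```

The absolute values keep the result positive regardless of which corner has the larger latitude or longitude.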


FIGURE 7. Zone Population Over Time 


5) HEATMAPS 


Heatmaps are based on the crowd density of a
specific zone. Density values are associated
with five colors, namely blue, cyan, green,
yellow, and red, with red marking the most
densely populated areas, as shown in
Figure 8. Heatmaps depicting real-time crowd
information are rendered in every iteration
of the simulation, whereas predictive
heatmaps are rendered after one hour to
forecast the next hour's conditions. The
results of predictive analytics obtained
through Spark are stored in a dedicated
MongoDB collection, which is queried to
generate a predictive heatmap
according to the number of participants in a
zone, as shown in Figure 9.
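A heatmap of this kind can be sketched in Python by mapping a zone's density to one of the five colors and then filling the zone's rectangle on an overlay. The cut-off values in `COLOR_SCALE` are hypothetical, since the paper names the colors but not the thresholds, and the overlay is modeled as a simple list of rows rather than an image object.

```python
# Hypothetical upper density bounds (persons per square metre) for each color;
# the paper names the five colors but not the cut-off values.
COLOR_SCALE = [
    (0.5, "blue"),
    (1.0, "cyan"),
    (2.0, "green"),
    (4.0, "yellow"),
    (float("inf"), "red"),
]


def density_color(participants, area_m2):
    """Map a zone's crowd density to one of the five heatmap colors."""
    density = participants / area_m2
    for upper_bound, color in COLOR_SCALE:
        if density < upper_bound:
            return color
    return "red"


def paint_zone(overlay, tx, ty, bx, by, color):
    """Fill the zone rectangle, top-left to bottom-right inclusive."""
    for i in range(tx, bx + 1):
        for j in range(ty, by + 1):
            overlay[j][i] = color  # rows are y, columns are x


# A 6 x 4 pixel overlay with one zone covering pixels (1, 1) to (3, 2).
overlay = [[None] * 6 for _ in range(4)]
zone_color = density_color(participants=4500, area_m2=1000.0)
paint_zone(overlay, 1, 1, 3, 2, zone_color)
```

The same `paint_zone` routine would serve both real-time and predictive heatmaps; only the source of the participant count differs.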


[Figure: color scale labeled Bare, Normal, Significant, High]


FIGURE 8. Color Scheme to Illustrate
Crowd Density


FIGURE 9. Heatmaps Depicting Crowd Density in Two Different Zones


6) ALERT GENERATION


Along with the heatmaps generated for crowd
analysis, alerts are generated
based on the crowd density in a zone, as
zone information is updated continuously in
the shared collection. The system is
designed so that the modeler can set
participant limits for the defined zones and
thereby manage and monitor the crowd easily.
If the number of participants in the moving
crowd approaches the maximum
limit of a zone, an alert is
generated alongside the heatmap, as
shown in Figure 10. Additionally, using
breadth-first search, the
system suggests that the modeler redirect
the crowd towards nearby zones with the
most available capacity.
For instance, if the zone capacity is 5000
participants, an alert that the zone is
approaching its maximum capacity is
generated when the count reaches 4000. When
the count reaches approximately 5000, a No
Entry alert is generated, which
stops the crowd outside the zone.
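The alert thresholds and the breadth-first search for alternative zones can be sketched in Python. The 80% warning ratio follows the 4000-of-5000 example in the text; the zone adjacency graph, the occupancy figures, and the `max_hops` and `limit` parameters are hypothetical, as the paper does not describe how zone neighborhoods are represented.

```python
from collections import deque

WARNING_RATIO = 0.8  # 4000 of 5000 participants in the example


def zone_alert(count, capacity):
    """Return 'NO_ENTRY', 'WARNING', or None for a zone's current occupancy."""
    if count >= capacity:
        return "NO_ENTRY"
    if count >= WARNING_RATIO * capacity:
        return "WARNING"
    return None


def suggest_zones(start, adjacency, counts, capacities, max_hops=2, limit=2):
    """Breadth-first search outward from a crowded zone.

    Collects alert-free zones within max_hops and ranks them by free capacity.
    """
    seen = {start}
    queue = deque([(start, 0)])
    candidates = []
    while queue:
        zone, hops = queue.popleft()
        if zone != start and zone_alert(counts[zone], capacities[zone]) is None:
            candidates.append(zone)
        if hops < max_hops:
            for neighbour in adjacency.get(zone, []):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append((neighbour, hops + 1))
    candidates.sort(key=lambda z: capacities[z] - counts[z], reverse=True)
    return candidates[:limit]


# Hypothetical zone graph and occupancy figures.
adjacency = {7: [4, 6], 4: [7, 5], 6: [7], 5: [4]}
capacities = {4: 5000, 5: 5000, 6: 5000, 7: 5000}
counts = {4: 500, 5: 1000, 6: 4200, 7: 4500}

alert = zone_alert(counts[7], capacities[7])
targets = suggest_zones(7, adjacency, counts, capacities)
```

In this example, zone 6 is skipped because it is itself above the warning ratio, so the suggestion falls on the two zones with the most free space.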


[Figure: alert dialog reading "Warning! Zone 7 is 90% filled, please redirect crowd to Zone 4 or Zone 5 as 90% to 80% space is available respectively"]


FIGURE 10. Alert Showing Crowd
Capacity Warning


VI. CASE STUDY: HAJJ


A venue holding large crowds is subject to
random events, tied to time and place, that
may be known or unknown in advance.
Situational contexts, such as the
location, the environment, or the state of
an agent, depend on the situation, and the
repertoire of the agents' behavior relies on
the dynamics of an event. Hajj is the annual
pilgrimage of Muslims to Makkah, Saudi
Arabia. This pilgrimage draws more than 3
million Muslim participants from around the
globe and is considered one of the largest
mass gatherings. It is conducted every year
in the last month of the Islamic calendar
(Dhu-al-Hijjah) on specified dates, that is,
from the 8th to the 13th. Receiving,
managing, and monitoring this crowd
manually and with cameras is
not feasible, as all the religious
activities of Hajj, known as rituals, must
be performed simultaneously
and in the same place by the entire crowd.
On the one hand, the organizers struggle to
conduct the event without
accidents or critical incidents, such as
stampedes, fires, or medical emergencies.
On the other hand, the participants and
their local/native organizers struggle to
keep their groups together, to avoid
dispersal during crowded rituals, and to
find lost and missing group members.


The proposed system will be connected
with a command and control center to share
information on any disaster or mishap
with the authorities and the relevant
official departments, including rescue
teams, police, army, and fire brigade, so
that rescue operations can start. Since the
number of pilgrims keeps increasing every
year, some related activities of
individuals and places have been defined to
model the event and to
store the information of individuals for
finding, tracking, and trailing every
individual and group leader. A pilgrim's
unique profile comprises their name, ID,
gender, origin, maktab, and location. Hajj
has many sub-events that form its
situational context, such as Tawaf, Rami,
and Halaq, and the analytics of interest
include flow, turbulence, and density.
Therefore, the Hajj areas have been divided
into regions, which are further divided into
multiple zones, as shown in Figure 11.


FIGURE 11. Sample Zone Division of Arafat Plain (Makkah)


A. ANALYTICS


Useful information on the movement
and influx of a large crowd in a simulated
scenario is obtained by performing analytics
on parameters at both the individual (agent)
and the crowd (multi-agent) level. Different
functions are provided to analyze the
movements of individuals, as shown in
Figure 12. Analytics can further be utilized
to estimate and visualize the possible
outcomes. For instance, an increase in
crowd density at a junction of low
capacity may result in an unfortunate event,
such as a stampede. Therefore, efficient
analytics play an important role in looking
ahead of the current state of affairs. The
predicted outcome can also be communicated
to individuals in a crowd for situation
awareness and alertness.




FIGURE 12. Movement of Agents in Simulator 


B. VISUALIZATION 


Visualization is an important aspect of
simulation, since it is a form of graphical
communication. The interactive interface
allows users to analyze the status of an
individual at different levels of
information. Textual information is at
times not sufficient to convey the
analytical spectrum of a situation. For that
purpose, context overlays and heat maps are
provided to graphically represent the
surroundings.


VII. RESULTS AND
EXPERIMENTAL SETUP

A. DATA SETS 


The data generated by the simulator is used
to conduct the experiments. Three different
sets of operational data are selected based
on crowd population, crowd observation
time, and data collection time difference,
as shown in Table II. The number of records
in each set is the crowd population
multiplied by the number of collection
intervals, that is, the observation time
divided by the data collection time
difference.


TABLE II
OPERATIONAL DATA
Parameters | Data-I | Data-II | Data-III
Crowd Population | 10,000 | 50,000 | 100,000
Crowd Observation Time | 180 Min | 180 Min | 180 Min
Data Collection Time Difference | 3 Min | 3 Min | 3 Min
No. of Records | 10,000 x 180/3 = 600,000 | 50,000 x 180/3 = 3,000,000 | 100,000 x 180/3 = 6,000,000


B. TESTING SCENARIOS


The proposed framework has been
evaluated on the following three frequently
executed scenarios, covering different
aspects of crowd monitoring, management,
and analytics. To this end,
variations have been incorporated in the
volumes of the data and in the indexes.


1) SCENARIO-I


The first scenario aims to calculate the
number of people in a zone at a given point
in time. This, in turn, can help in
calculating the total population of a region.


2) SCENARIO-II


The second scenario addresses a very common
problem during Hajj, namely finding
lost individuals. A large
number of elderly people perform
Hajj, and they frequently become separated
from their caretakers for many
reasons.


3) SCENARIO-III


The third scenario is related to data
analytics. It intends to find the people
near the place of an incident at the time of
that incident. This is a useful scenario
that helps in conducting the investigation
of an incident.


C. EVALUATION 


The performance of each index is evaluated
on three different volumes of data. The
volume of positional data varies with
the crowd population. The
performance of a query, with and without an
index, is measured with respect to its
execution time.


D. OPERATIONAL DATA


Query execution times are given in Table
III.


TABLE III
QUERY EXECUTION TIME (MILLISECONDS)
Query | Parameter | Data-I | Data-II | Data-III
Query-I | Time (without Index) | 550 | 3093 | 6484
Query-I | Time (with Index) | 53 | 788 | 1230
Query-II | Time (without Index) | 6123 | 23587 | 52138
Query-II | Time (with Index) | 2347 | 7800 | 18022
Query-III | Time (without Index) | 312 | 1596 | 3303
Query-III | Time (with Index) | 1 | 1 | 1


1) SCENARIO-I CALCULATING ZONE 
POPULATION 


The compound query for this
scenario was executed against different
sizes of positional data generated by the
simulator. It calculates the number of
participants present in a specified zone, to
ensure that the number of participants does
not exceed the zone's defined capacity.
When the number of participants exceeds
the defined capacity, the chances of
stampedes and crushes increase as well. The
performance of the query was measured with
respect to its execution time. Figure 13
shows that, for data set I, the query took
550 milliseconds without an index and 53
milliseconds with a compound index.
For data set II, it took
3093 milliseconds without an index and
788 milliseconds with a compound index.
For data set III, it took 6484
milliseconds without an index and 1230
milliseconds with a compound index.
The line graph presented in Figure
14 shows that the query response time rises
far more steeply without the compound index
than with it.
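One way to read the Query-I timings reported above is to compute the speedup gained from the index and the rate at which execution time grows relative to the number of records. The Python sketch below uses only the figures given in the text; the `growth` helper is a hypothetical name introduced for illustration.

```python
# Query-I timings in milliseconds, copied from the figures reported above.
records = {"Data-I": 600_000, "Data-II": 3_000_000, "Data-III": 6_000_000}
without_index = {"Data-I": 550, "Data-II": 3093, "Data-III": 6484}
with_index = {"Data-I": 53, "Data-II": 788, "Data-III": 1230}

# How many times faster the query runs with the compound index.
speedup = {name: without_index[name] / with_index[name] for name in records}


def growth(times, smaller, larger):
    """Ratio of time growth to record growth between two data sets.

    A value near 1 means time scales in proportion to the number of records.
    """
    time_ratio = times[larger] / times[smaller]
    record_ratio = records[larger] / records[smaller]
    return time_ratio / record_ratio


unindexed_growth = growth(without_index, "Data-I", "Data-II")
```

Such ratios make it easier to compare how the indexed and unindexed queries scale across the three data sets than the raw timings alone.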


FIGURE 13. Query-I Participants Present in a Specified Region Within a Certain Time Frame


FIGURE 14. Query-I Participants Present in a Specified Region Within a Certain Time Frame


2) SCENARIO-II LOCATE A LOST
INDIVIDUAL DURING HAJJ


A compound query for scenario
II was executed against different sizes of
positional data generated by the simulator.
It is intended to locate a lost individual
from his/her current position and from the
trail of the individual's movement, since it
is otherwise almost impossible to locate a
lost individual in such a large crowd. The
performance of the query is measured with
respect to its execution time, and the
evaluation results are presented in Figures
15 and 16. The query execution time for
Data-I is 312 milliseconds without an index
and 53 milliseconds with a single-field
index. For Data-II, it took 1596
milliseconds without an index and 23
milliseconds with a single-field index.
For Data-III, it took 3303
milliseconds without an index and 11
milliseconds with a single-field index. It
is important to mention that the data model
is designed carefully to accommodate this
use case. Each individual's
trail is stored in a separate file along
with relevant meta information. Thus,
an individual's trail and current location
can be found very quickly with the help of
the index.
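The benefit of an index for trail lookups can be illustrated in Python by comparing a full scan with a lookup in a dictionary keyed by participant ID. This is a conceptual sketch only: the record layout and the choice of participant ID as the indexed field are assumptions, and a dictionary stands in for the database index.

```python
from collections import defaultdict

# Hypothetical positional records: (participant_id, timestamp_minute, lon, lat).
records = [
    (17, 0, 39.9850, 21.3550),
    (42, 0, 39.9861, 21.3542),
    (17, 3, 39.9853, 21.3554),
    (42, 3, 39.9866, 21.3547),
    (17, 6, 39.9858, 21.3559),
]


def trail_by_scan(records, participant_id):
    """Without an index: inspect every record, so cost grows with data volume."""
    hits = [r for r in records if r[0] == participant_id]
    return sorted(hits, key=lambda r: r[1])


def build_index(records):
    """Single-field index on participant ID: ID -> time-ordered records."""
    index = defaultdict(list)
    for record in records:
        index[record[0]].append(record)
    for trail in index.values():
        trail.sort(key=lambda r: r[1])
    return index


def trail_by_index(index, participant_id):
    """With an index: one dictionary lookup, regardless of total record count."""
    return index.get(participant_id, [])


index = build_index(records)
trail = trail_by_index(index, 17)
current_position = trail[-1][2:]  # latest (lon, lat) of participant 17
```

Both functions return the same trail; the difference lies in how much data each has to touch, which is the effect the measured execution times reflect.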
FIGURE 15. Query-II Participants Present at a Certain Distance from a Specified Point


FIGURE 16. Query-II Participants Present at a Certain Distance From a Specified Point


3) SCENARIO-III PARTICIPANTS
PRESENT AT A CERTAIN DISTANCE
FROM THE PLACE OF THE INCIDENT
AT THE TIME OF THE EVENT


A compound query for scenario
III was executed against different sizes of
positional data generated by the simulator
to locate individuals and their distance
from specified points, such as boundary
walls, pathways, and entry and exit points
defined in the region. This query helps to
identify the individuals present around the
place of an incident at the time it
took place. It is pertinent to note that
this query is not meant for live data
monitoring; rather, it will be executed on
the analytical engine for post-incident
investigations. The performance of the query
is measured with respect to its execution
time, and the evaluation results are
presented in Figures 17 and 18. Figure 17
shows that, for Data-I, the query execution
time was 6123 milliseconds without an index
and 2347 milliseconds with a geospatial
index. For Data-II, it took 23,587
milliseconds without an index and 7800
milliseconds with a geospatial index.
For Data-III, it took 52,138
milliseconds without an index and 18,022
milliseconds with a geospatial index.
Figure 18 shows that the execution time
without indexing rises steeply with data
volume, while the execution time with the
index grows slightly faster than linearly.
For comparison, on Data-I (positional data
of 10,000 participants observed at 3-minute
intervals, resulting in 0.6 million
records), the participant-trail query took
312 milliseconds without an index and 53
milliseconds with a single-field index.
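The post-incident proximity search can be sketched in Python as a filter on time and great-circle distance. The haversine formula used here is a standard way to compute distances on a sphere and is an assumption about how proximity might be evaluated, not the paper's stated method; the record layout, the 50 m radius, and the time window are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two (lon, lat) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def people_near_incident(records, incident, radius_m, t_start, t_end):
    """IDs of participants seen within radius_m of the incident in the window."""
    lon0, lat0 = incident
    return sorted({
        pid
        for pid, t, lon, lat in records
        if t_start <= t <= t_end
        and haversine_m(lon0, lat0, lon, lat) <= radius_m
    })


# Hypothetical records: (participant_id, timestamp_minute, lon, lat).
records = [
    (1, 60, 39.98500, 21.35500),
    (2, 60, 39.98520, 21.35510),
    (3, 60, 39.99500, 21.35500),
    (2, 120, 39.98510, 21.35505),
    (4, 63, 39.98490, 21.35495),
]
nearby = people_near_incident(records, incident=(39.985, 21.355),
                              radius_m=50, t_start=57, t_end=63)
```

Participant 3 is excluded because it is roughly a kilometre away, and the later sighting of participant 2 falls outside the time window.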
FIGURE 17. Query-III Trail of a Participant


FIGURE 18. Query-III Trail of a Participant


VIII. CONCLUSION 


A big data resource framework has been
presented in the current study to store and
analyze the positional data of large crowds.
This framework a) allows crowd modelers
to design event-centric simulations of
large events by using
a generic context simulator; and b) enables
crowd monitoring
authorities to analyze current
crowd conditions in order to prevent
problematic situations. A proof-of-concept
application has also been implemented to
model and simulate large events and to
analyze and visualize crowd dynamics from
different aspects. Heatmaps and
graphs have been employed to visualize the
crowd movement and density in different
zones and regions at a given point in time.
The proposed processing infrastructure
comprises a) MongoDB, a distributed
data store, to store the data and process
geospatial queries; b) Spark, a distributed
processing framework, for performing
real-time and predictive analytics; c) a
data ingestion component to facilitate
pre-processing of data through filtration
and modeling; and d) a simulation component
to generate data. The efficiency of the
proposed framework was evaluated on
simulated data by executing geospatial
queries for some frequently occurring
scenarios, including visualizing crowd
density, locating an individual, and finding
people near the place of a past
incident. To this end, the impact of indexes
on the different queries has also been
assessed, in view of their expected frequent
usage.


In future, this work can be extended further
by implementing its physical layer to gather
real data. The operation of the
developed infrastructure would then also be
assessed on a real event in order to
practically evaluate the efficacy and
limitations of the system.